-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: Eradicate unpaged SELECTs and warn about their drawbacks #1068
docs: Eradicate unpaged SELECTs and warn about their drawbacks #1068
Conversation
In paged.md such formulation was already rephrased in PR with paging API refactor.
We shouldn't encourage users to perform unpaged SELECTs. Normally, SELECTs should be paged, therefore the whole doc series on CQL data types is modified to use query_iter instead of query_unpaged.
We shouldn't encourage users to perform unpaged SELECTs. Normally, SELECTs should be paged, therefore the example in README is modified to use query_iter instead of query_unpaged.
We shouldn't encourage users to perform unpaged SELECTs. Normally, SELECTs should be paged, therefore the quickstart example is modified to use query_iter instead of query_unpaged.
|
NOTE: examples in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general it LGTM.
One thing I just realized is that paged.md
still contains the old performance disclaimer. I thought query_iter
vs the new execute_unpaged
had their own caveats.
Either way, I guess it is fine to follow-up somewhere else and it shouldn't block the existing work.
In let mut query = Query::from("SELECT * FROM ks.table");
// Query is not assigned any specific profile, so session's profile is applied.
// Therefore, the query will be executed with Consistency::One.
session.query_unpaged(query.clone(), ()).await?;
query.set_execution_profile_handle(Some(query_profile.into_handle()));
// Query's profile is applied.
// Therefore, the query will be executed with Consistency::Two.
session.query_unpaged(query.clone(), ()).await?;
query.set_consistency(Consistency::Three);
// An option is set directly on the query.
// Therefore, the query will be executed with Consistency::Three.
session.query_unpaged(query, ()).await?; I'd use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But otherwise, LGTM
Wouldn't |
A general thought: Now all SELECTs are recommended to be done using However, let's stick to this approach, because it guarantees correctness wrt paging. In GRER (Giant Request Execution Refactor) we'll think about redesigning iteration over pages so that it's not that expensive, so by 0.17 we should have this solved. |
I think it is fine. As long as we clarify the Performance section in paged.md as I pointed out earlier and discuss these points in depth there. Probably that's how you want to introduce the paged queries doc |
What disclaimer do you have in mind? All that I can see seem to be up-to-date. |
That's true. My only concern was that we use |
In examples refactor (that I'm currently working on), I decided to make use of |
Remember that even if the result is going to consist of a single row, in some situations (e.g. when there are a lot of tombstones - @fee-mendes please confirm) there can be a number of empty pages received before we finally get one with our expected row. |
Correct. If the query is paged, empty replica pages will kick in. If it is unpaged, then you retrieve the full result set. In general, it is hard to guarantee a SELECT will return only a single row, or a single page. |
I am reading paged.md which says:
Well, this doesn't seem to be accurate. In fact, it was never accurate.
Maybe rather than talking about Performance, we should just recommend what's better instead. Always prefer paged queries, from the driver's perspective use X as it performs better due to A, B, C. We also provide Y, which is slightly less performant on the client-side given D, E, F. // some warning disclaimer on unpaged // |
Right. This particular statement was never true wrt |
Well, be mindful though - that performance is subjective even in the Hence why I think the focus should shift from "Performance" to our recommended "Best Practices", with both client/server in mind. |
docs/source/queries/result.md
Outdated
|
||
> To sum up, for SELECTs that may return return a lot of data prefer paged queries, | ||
> e.g. with `Session::query_iter()` (see [Paged queries](paged.md)). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe remove a remark about a lot of data
because of the tombstone problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe rephase it as (especially those that may return a lot of data)
?
New commit addressing @fee-mendes' comments: giant overview of statements and methods of executing them, together with best practices. |
349bc79
to
5b39d8f
Compare
Addressed Karol's comments. |
| Cluster load | potentially **HIGH** for large results, beware! | normal | normal | | ||
| Driver overhead | low - simple frame fetch | low - simple frame fetch | considerable - `RowIteratorWorker` is a separate tokio task | | ||
| Feature limitations | none | none | speculative execution not supported | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should print warning / erorr query if speculative execution is set on execution profile used in {query ,execute}_iter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that might be a good idea, but OTOH forcing using different profiles for iter and non-iter execution and thus being inconvenient (for speculative execution users).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry about situation where user wants to use speculative execution, but silently doesn't because of _iter
methods. You raised good point, I'm not sure how to best handle this possible issue.
| Suitable operations | - in general: operations with empty result set (non-SELECTs)</br> - as possible optimisation: SELECTs with LIMIT clause | - for advanced users who prefer more control over paging, with less overhead of `RowIteratorWorker` | - in general: all SELECTs | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't SELECT with limit still susceptible to tombstones problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Suitable operations | - in general: operations with empty result set (non-SELECTs)</br> - as possible optimisation: SELECTs with LIMIT clause | - in general: all SELECTs | | ||
|
||
For more detailed comparison and more best practices, see [doc page about paging](paged.md). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto about LIMIT
We shouldn't encourage users to perform unpaged SELECTs. Normally, SELECTs should be paged, therefore caveat about that is added in two relevant places in docs.
This wasn't reverted when design regarding PageSize changed during code review.
In order to avoid API misuse, much knowledge is now shared in a structured way of tables, and best practices are described to aid users.
5b39d8f
to
8604a33
Compare
Addressed another round of @Lorak-mmk and @muzarski comments. |
Follow-up of #1061.
Motivation
Unpaged SELECTs should be discouraged due to the following drawbacks:
What's done
query_iter
instead,Note to reviewers
Please make sure (by reading both crate docs and the docs book) that in all places it is clear enough that SELECTs by default should be done using
query_iter
rather thanquery_unpaged
, and why so.Pre-review checklist
[ ] I added relevant tests for new features and bug fixes.[ ] I have provided docstrings for the public items that I want to introduce../docs/source/
.[ ] I added appropriateFixes:
annotations to PR description.